Welcome to spSAM’s documentation!¶

spSAM: 10X visium spot Split Align Map¶

In [ ]:
import spsam as ss
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

plt.rcParams['figure.figsize'] = (6, 6)

Load Data¶

Load anndata object intergrating 10X Visium data with scRNA-seq reference of cell types through cell2location package.

In [ ]:
adata = ss.read_h5ad('demo.h5ad')
In [ ]:
adata.obs.keys()
Out[ ]:
Index(['in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts',
       'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts',
       'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes',
       'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes',
       'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', '_indices',
       '_scvi_batch', '_scvi_labels', 'B cells', 'B-cell lineage',
       'Cycling cells', 'DC', 'ILC', 'Macrophages', 'Monocytes',
       'Plasma cells', 'T cells', 'pDC', 'clusters'],
      dtype='object')

Through adata.obs.keys() , we can see total 10 cell types ['B cells', 'B-cell lineage', 'Cycling cells', 'DC', 'ILC', 'Macrophages', 'Monocytes', 'Plasma cells', 'T cells', 'pDC'] were mapped to 10X Visium data.

Score Genes¶

This reproduces the approach in Seurat (Satija) and has been implemented for Scanpy by Davide Cittar.

Here we use 15 top-ranked hypoxia-associated genes, ['VEGFA', 'SLC2A1', 'PGAM1', 'ENO1', 'LDHA', 'TP11', 'P4HA1', 'MRPS1', 'CDKN3', 'ADM', 'NDRG1', 'TUBB6', 'ALDOA', 'MIF', 'ACOT7'], which are collectively considered to be hypoxia signature (Buffa signature) to assess hypoxia statu. Custom gene sets are also supported.

In [ ]:
hypoxia_gene_lt = ['VEGFA', 'SLC2A1', 'PGAM1', 'ENO1', 'LDHA', 'TP11', 'P4HA1', 'MRPS1', 'CDKN3', 'ADM', 'NDRG1', 'TUBB6', 'ALDOA', 'MIF', 'ACOT7']
ss.tl.score_genes(adata, gene_list=hypoxia_gene_lt, ctrl_size=len(hypoxia_gene_lt), score_name='hypoxia_score')
WARNING:root:genes are not in var_names and ignored: ['VEGFA', 'LDHA', 'TP11', 'MRPS1', 'ALDOA']
computing score 'hypoxia_score'
finished
hypoxia_score, score of gene set (adata.obs).
104 total control genes are used.

Plot hypoxia_score distribution

In [ ]:
ss.pl.violin(adata, 'hypoxia_score')
No description has been provided for this image

Show score in visium image

In [ ]:
ss.pl.spatial(adata, img_key='hires', color=['hypoxia_score'])
No description has been provided for this image

Additionally, you can use ss.pl.plot_3d_score() function to plot hypoxia_score in 3D spatial dimension of 10x visium image

In [ ]:
ss.pl.plot_3d_score(adata, 'hypoxia_score')

Score Lever¶

Divide spot into three parts based on the distribution of the score values.
lever1(default): hypoxia_score >= 95% distribution of the score values.
lever2(default): hypoxia_score >= 50% distribution of the score values.
background: hypoxia_score < 50% distribution of the score values.
If using other thresholds to divide spot, pass two values through (lever1, lever2) parameter, like ss.pp.score_lever(adata, 'hypoxia_score', lever1=0.8, lever2=0.6), then background part will set hypoxia_score < 80% automatically.

In [ ]:
ss.pp.score_lever(adata, 'hypoxia_score')
hypoxia_score median:0.4765155938955453; mean:0.4701082144745993; std:0.18736050562146309
hypoxia_score lever1 value:0.7705623546013465; lever2 value:0.4765155938955453
score lever definition finished, score_type key is added to adata.obs, use adata.obs_keys() to check
In [ ]:
adata.obs.head()
Out[ ]:
in_tissue array_row array_col sample n_genes_by_counts log1p_n_genes_by_counts total_counts log1p_total_counts pct_counts_in_top_50_genes pct_counts_in_top_100_genes ... DC ILC Macrophages Monocytes Plasma cells T cells pDC clusters hypoxia_score score_type
AAACCGGGTAGGTACC-1 1 42 28 LD 7324 8.899048 25961.0 10.164390 12.387812 16.944648 ... 1.111955 0.415085 1.012286 1.422574 0.854083 1.465931 0.665313 5 0.456343 background
AAACCGTTCGTCCAGG-1 1 52 42 LD 7850 8.968396 37458.0 10.531002 13.762080 18.930536 ... 1.016049 0.493862 1.254048 1.144248 1.351480 1.134662 0.931101 0 0.357551 background
AAACCTCATGAAGTTG-1 1 37 19 LD 3961 8.284504 7936.0 8.979291 12.449597 17.905746 ... 0.643218 0.307800 0.567228 0.544434 0.602757 0.724065 0.476145 2 0.857315 lever1
AAACGAGACGGTTGAT-1 1 35 79 LD 7348 8.902320 29413.0 10.289227 14.605787 19.542379 ... 1.120537 0.381241 1.447186 1.067380 0.951656 1.158090 0.526404 3 0.499969 lever2
AAACTGCTGGCTCCAA-1 1 45 67 LD 7902 8.974998 35196.0 10.468717 13.504375 18.172520 ... 1.273408 0.467712 1.253389 1.485285 1.305185 1.323902 0.894033 4 0.822195 lever1

5 rows × 31 columns

Use a color_map directory to plot dividing three parts into visium image.

In [ ]:
color_map_dt = {
    'lever1': 'red',
    'lever2': 'orange',
    'background': 'darkblue',
}
ss.pl.spatial(adata, img_key='hires', color='score_type', palette=color_map_dt)
No description has been provided for this image

Split¶

In this step, we use the DFS algorithm starting from the red spot to expand outward, with the expansion condition being that the adjacent point is either a red spot or a yellow spot, until it cannot extend any further or reaches the edge of the image. After traversal, the red spots will be clustered into clusters. In a cluster, if there is only one cluster of red spots, it is called an 'independent' type; if there are more than two clusters of red spots in a cluster, the cluster will be further divided into smaller units based on the boundaries of each red spot cluster, known as 'adjacent' type. A new column 'class' will be created in the cluster_df.

In [ ]:
cluster_df = ss.pp.find_lever_core(adata, 'hypoxia_score')
In [ ]:
cluster_df
Out[ ]:
row col score type cluster_idx class
0 37 19 0.857315 lever1 0 adjacent
1 37 17 1.219024 lever1 0 adjacent
2 37 15 0.697438 lever2 0 adjacent
3 36 16 0.661241 lever2 0 adjacent
4 36 18 0.774277 lever1 0 adjacent
... ... ... ... ... ... ...
272 59 65 0.535874 lever2 6 independent
273 61 65 0.540075 lever2 6 independent
274 60 60 0.602142 lever2 6 independent
275 61 59 0.533772 lever2 6 independent
276 29 29 0.781194 lever1 7 independent

277 rows × 6 columns

Plot 'adjacent' type spot cluster, 8 means show top8 clusters.

In [ ]:
ss.pl.plot_lever_core(cluster_df, 'adjacent', 8)

Plot 'independent' type spot cluster

In [ ]:
ss.pl.plot_lever_core(cluster_df, 'independent')

Align¶

Aligning with the red spots as reference points, merge all clustered units partitioned based on their distances to the red spots into one cluster.

In [ ]:
ss.pp.cycle_spot_count(cluster_df)

Next, show spot number in every cycle througn ss.pl.plot_cycle_bar(), cycles containing less than 10 spots were filtered.

In [ ]:
ss.pl.plot_cycle_bar(cluster_df)

Cycle 0 means core red cluster(lever1), it contains 55 spots.
Cycle 1 means yellow spot(lever2) that have a distance of 1 unit from cycle 0.
Cycle 2 means yellow spot(lever2) that have a distance of 2 units from cycle 0.
...
Cycle 5 means yellow spot(lever2) that have a distance of 5 units from cycle 0.

In [ ]:
cluster_df
Out[ ]:
row col score type cluster_idx class cycle
0 37 19 0.857315 lever1 0 adjacent 0.0
1 37 17 1.219024 lever1 0 adjacent 0.0
2 37 15 0.697438 lever2 0 adjacent 1.0
3 36 16 0.661241 lever2 0 adjacent 1.0
4 36 18 0.774277 lever1 0 adjacent 0.0
... ... ... ... ... ... ... ...
272 59 65 0.535874 lever2 6 independent 3.0
273 61 65 0.540075 lever2 6 independent 1.0
274 60 60 0.602142 lever2 6 independent 3.0
275 61 59 0.533772 lever2 6 independent 4.0
276 29 29 0.781194 lever1 7 independent 0.0

277 rows × 7 columns

Map¶

Map the expression levels of specified cell types to the corresponding cycles.

In [ ]:
cell_type_lt = ['B cells', 'B-cell lineage', 'Cycling cells', 'DC', 'ILC', 'Macrophages', 'Monocytes', 'Plasma cells', 'T cells', 'pDC']

The ss.pp.get_cell_abundance() function requires three input parameters: adata, cluster, and cell_type_lt, where the cell_type_lt cell type list can be customized but must be included in the adata.obs object.

In [ ]:
ss.pp.get_cell_abundance(adata, cluster_df, cell_type_lt)

Visualization of different cell abundance in different cycles.

In [ ]:
ss.pl.plot_cycle_abundance(cluster_df, cell_type_lt)

From the above bar graph, it can be observed that the expression levels of various immune cells increase with the number of cycles from 0 to 4, and slightly decrease at the 5th cycle. However, there are significant differences in expression levels among different cell types. For better comparison, the ss.pp.minmax_scaler() function can be used to map the expression levels of different cells to the range of 0-1.

In [ ]:
cluster_scale_df = ss.pp.minmax_scaler(cluster_df, cell_type_lt)
In [ ]:
ss.pl.plot_cycle_abundance(cluster_scale_df, cell_type_lt)

To represent cell abundance using a line graph, the ss.pl.plot_line_abundance() function can be used.

In [ ]:
ss.pl.plot_line_abundance(cluster_scale_df, cell_type_lt)
No description has been provided for this image

To view the expression level of a specific cell type, the ss.pl.plot_cell_abundance() function can be used

In [ ]:
ss.pl.plot_cell_abundance(cluster_scale_df, 'B cells')
WARNING:matplotlib.legend:No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
No description has been provided for this image
In [ ]: